Generating the instructions

Now we need to consider the nitty-gritty details of just what instructions are generated for each [[instr]]. In early passes, we'll just need to know how many instructions are required (and that number may change from pass to pass, so it must be recomputed). In the last pass, the sizes are stable (by definition), so we can look at the sizes to see what instructions to generate.

We'll consider the [[instrs]] in groups, but first, here's the way we will structure things: «compute positions»= let «functions for computing sizes» val newsize = case inst of «cases for sizes to be computed» in if newsize > size then sizeref := newsize else (); gen(pos+(!sizeref) (* BUGS – was pos+size*),pcptr,rest) end «emit MIPS instructions»= let fun gen1() = gen(pos+size,pcptr,rest) (* generate the rest of the [[instr]]s *) open Bits open Opcodes «declare reserved registers [[tempreg]] and [[pcreg]]» «functions for emitting instructions» in case inst of «cases of instructions to be emitted» end @ When we get around to generating code, we may need to use a temporary register. For example, if we want to load into a register an immediate constant that won't fit into 16 bits, we will have to load the high-order part of the constant with [[lui]], then use [[addi]] to add then the low-order part. The MIPS assembler has a similar problem, and on page D-2 of the MIPS book we notice that register 1 is reserved for the use of the assembler. So we do the same.

We need to reserve a second register for use in pointing to the program counter. We will use register 31 because the [[bltzal]] instruction automatically sets register 31 to the PC. «declare reserved registers [[tempreg]] and [[pcreg]]»= val tempreg = 1 val pcreg = 31 @ Before showing the code for the actual instructions, we should point out that we have two different ways of emitting a long word. [[emitlong]] just splits the bits into two pieces for those cases when it's desirable to put a word into the memory image. [[split]] gives something that will load correctly when the high-order piece is loaded into a high-order halfword (using [[lui]]), and the low-order piece is sign-extended and then added to the high-order piece. This is the way we load immediate constants of more than sixteen bits. It is also useful for generating load or store instructions with offsets of more than sixteen bits: we [[lui]] the [[hi]] part and add it to the base regsiter, then use the [[lo]] part as an offset. «functions for emitting instructions»= fun emitlong i = emit(rshift(i,16), andb(i,65535)) (* emit one long word (no sign fiddling) *) fun split i = let val hi = rshift(i,16) and lo = andb(i,65535) in if lo<32768 then (hi,lo) else (hi+1, lo-65536) end

@ We begin implementing [[instrs]] by considering those that emit constants. String constants are padded with nulls out to a word boundary. Integer constants are just emitted with [[emitlong]]. «cases for sizes to be computed»= STRINGCONST s => Integer.div(String.length(s)+3,4) | EMITLONG _ => 1 «cases of instructions to be emitted»= STRINGCONST s => let val s' = s ^ "00000000" in gen1(emit_string (4*size) s') (* doesn't know Big vs Little-Endian *) end | EMITLONG i => gen1(emitlong i) @ Next consider the labels. A [[DEFINE]] should never reach this far, and [[EMITLAB]] is almost like an [[EMITLONG]]. «cases for sizes to be computed»= | DEFINE _ => ErrorMsg.impossible "generate code for DEFINE in mipscoder" | EMITLAB _ => 1 «cases of instructions to be emitted»= | DEFINE _ => gen1(ErrorMsg.impossible "generate code for DEFINE in mipscoder") | EMITLAB(i, ref d) => gen1(emitlong((d-pos)*4+i)) @ Now we have to start worrying about instructions with [[EA]] in them. The real difficulty these things present is that they may have an immediate operand that won't fit in 16 bits. So we'll need to get this large immediate operand into a register, sixteen bits at a time, and then do the operation on the register.

Since all of the arithmetic instructions have this difficulty, and since we can use them to implement the others, we'll start with those and catch up with the control-flow instructions later. @ [[SUB]], [[MULT]], [[DIV]], and [[MFLO]] all use registers only, so they are easy. The other arithmetic operations get treated exactly the same, so we'll use a function to compute the size. move this to follow the definition of [[arith]]? «cases for sizes to be computed»= | ADD(_, ea, _) => easize ea | AND(_, ea, _) => easize ea | OR (_, ea, _) => easize ea | XOR(_, ea, _) => easize ea | SUB _ => 1 | DIV (_,_) => 1 | MULT (_,_) => 1 | MFLO _ => 1 | MFHI _ => 1 @ Register operations take one instruction. Immediate operations take one instruction for 16 bit constants, and 3 for larger constants (since it costs two instructions to load a big immediate constant into a register). An immediate instruction with [[Immedlab l]] means that the operand is intended to be the machine address associated with that label. To compute that address, we need to add [[4*(l-pcptr)]] to the contents of register [[pcreg]] (which holds [[4*pcptr]]), put the results in a register, and operate on that register.

This tells us enough to compute the sizes. «functions for computing sizes»= fun easize (Direct _) = 1 | easize (Immed i) = if abs(i)<32768 then 1 else 3 | easize (Immedlab(ref lab)) = 1 + easize(Immed (4*(lab-(get pcptr)))) @ As we have seen, to implement any arithmetic operation, we need to know the register form and the sixteen-bit immediate form. We will also want the operator from [[instr]], since we do the large immediate via a recursive call. We'll set up a function, [[arith]], that does the job. «functions for emitting instructions»= fun arith (opr, rform, iform) = let fun ar (Reg op1, Direct (Reg op2), Reg result) = gen1(emit(rform(result,op1,op2))) | ar (Reg op1, Immed op2, Reg result) = (case size of 1 (* 16 bits *) => gen1(emit(iform(result,op1,op2))) | 3 (* 32 bits *) => gen(pos,pcptr, (ref 2, LDI_32(op2, Reg tempreg)):: (ref 1, opr(Reg op1, Direct(Reg tempreg), Reg result)):: rest) | _ => gen(ErrorMsg.impossible "bad size in arith Immed in mipscoder") ) | ar (Reg op1, Immedlab (ref op2), Reg result) = gen(pos, pcptr, (ref (size-1), ADD(Reg pcreg,Immed(4*(op2-(get pcptr))), Reg tempreg)):: (ref 1, opr(Reg op1, Direct(Reg tempreg), Reg result)):: rest) in ar end @ The generation itself may be a bit anticlimactic. The MIPS has no ``subtract immediate'' instruction, and [[SUB]] has a different type than the others, so we emit it directly. «cases of instructions to be emitted»= | ADD stuff => arith (ADD,add,addi) stuff | AND stuff => arith (AND,and',andi) stuff | OR stuff => arith (OR,or,ori) stuff | XOR stuff => arith (XOR,xor,xori) stuff | SUB (Reg op1, Reg op2, Reg result) => gen1(emit(sub(result,op1,op2))) | DIV (Reg op1, Reg op2) => gen1(emit(div(op1,op2))) | MULT(Reg op1, Reg op2) => gen1(emit(mult(op1,op2))) | MFLO(Reg result) => gen1(emit(mflo(result))) | MFHI(Reg result) => gen1(emit(mfhi(result))) @ Floating point arithmetic is pretty easy because we always do it in registers. We also support only one format, double precision. «cases for sizes to be computed»= | NEG_D _ => 1 | MUL_D _ => 1 | DIV_D _ => 1 | ADD_D _ => 1 | SUB_D _ => 1 @ When emitting instructions we have to remember the Mips instructions use result on the left, but the [[MIPSCODER]] signature requires result on the right. «cases of instructions to be emitted»= | NEG_D (Reg op1,Reg result) => gen1(emit(neg_fmt(D_fmt,result,op1))) «functions for emitting instructions»= fun float3double instruction (Reg op1,Reg op2,Reg result) = gen1(emit(instruction(D_fmt,result,op1,op2))) «cases of instructions to be emitted»= | MUL_D x => float3double mul_fmt x | DIV_D x => float3double div_fmt x | ADD_D x => float3double add_fmt x | SUB_D x => float3double sub_fmt x

@ We offer a separate [[MOVE]] instruction because of large immediate constants. It is always possible to do [[move(src,dest)]] by doing [[add(Reg 0,src,dest)]], but the general form [[add(Reg i, Immed c, dest)]] takes three instructions when [[c]] is a large constant (more than 16 bits). Rather than clutter up the code for [[add]] (and [[or]] and [[xor]]) by trying to recognize register 0, we provide [[move]] explicitly.

[[LDI_32]] takes care of the particular case in which we are loading a 32-bit immediate constant into a register. It dates from the bad old days before [[MOVE]], and it might be a good idea to remove it sometime. «functions for emitting instructions»= fun domove (Direct (Reg src), Reg dest) = gen1(emit(add(dest,src,0))) | domove (Immed src, Reg dest) = (case size of 1 (* 16 bits *) => gen1(emit(addi(dest,0,src))) | 2 (* 32 bits *) => gen(pos,pcptr,(ref 2, LDI_32(src, Reg dest))::rest) | _ => gen(ErrorMsg.impossible "bad size in domove Immed in mipscoder") ) | domove (Immedlab (ref src), Reg dest) = gen(pos, pcptr, (ref size, ADD(Reg pcreg,Immed(4*(src-(get pcptr))), Reg dest))::rest) @ Notice we use [[easize]] and not [[movesize]] in the third clause because when we reach this point the treatment of a [[MOVE]] is the same as that of an [[ADD]]. «functions for computing sizes»= fun movesize (Direct _) = 1 | movesize (Immed i) = if abs(i)<32768 then 1 else 2 | movesize (Immedlab(ref lab)) = easize(Immed (4*(lab-(get pcptr))))

«cases for sizes to be computed»= | MOVE (src,_) => movesize src | LDI_32 _ => 2 | LUI _ => 1 «cases of instructions to be emitted»= | MOVE stuff => domove stuff | LDI_32 (immedconst, Reg dest) => let val (hi,lo) = split immedconst in gen1(emit(lui(dest,hi));emit(addi(dest,dest,lo))) end | LUI (Reg dest,immed16) => gen1(emit(lui(dest,immed16)))

@ Now that we've done arithmetic, we can see how to do control flow without too much trouble. [[SLT]] can be treated just like an arithmetic operator. [[BEQ]] is simple if the address to which we branch is close enough. Otherwise we use the following sequence for [[BEQ(Reg op1, Reg op2, ref dest)]]:

        bne op1,op2,L
        ADD (Reg pcreg, Immed (4*(dest-pcptr)), Reg tempreg)
        jr tempreg
     L: ...
Notice we don't have to put a [[NOP]] in the delay slot of the [[bne]]. We don't need one after the jump unless we needed one after the original [[BEQ]], in which case one will be there. If the branch is taken, we're doing as well as we can. If the branch is not taken, we will have executed an [[add]] or [[lui]] in the delay slot of the [[bne]], but the results just get thrown away. «cases for sizes to be computed»= | SLT(_, ea, _) => easize ea | BEQ(_,_,_,ref dest) => if abs((pos+1)-dest) < 32768 then 1 (* single instruction *) else 2+easize (Immed (4*(dest-(get pcptr)))) | JUMP _ => 1 | SLT_D _ => 1 | SEQ_D _ => 1 | BCOP1(_,ref dest) => if abs((pos+1)-dest) < 32768 then 1 (* single instruction *) else 2+easize (Immed (4*(dest-(get pcptr)))) | NOP => 1 @ The implementation is as described, except we use a non-standard [[nop]]. There are many Mips instructions that have no effect, and the standard one is the word with all zeroes ([[sll 0,0,0]]). We use [[add]], adding 0 to 0 and store the result in 0, because it will be easy to distinguish from a data word that happens to be zero. «cases of instructions to be emitted»= | SLT stuff => arith (SLT,slt,slti) stuff | BEQ(b, Reg op1, Reg op2, ref dest) => if size = 1 then gen1(emit((if b then beq else bne)(op1,op2,dest-(pos+1)))) else gen(pos,pcptr, (ref 1, BEQ(not b, Reg op1, Reg op2, ref(pos+size))) ::(ref (size-2), ADD(Reg pcreg, Immed(4*(dest-(get pcptr))), Reg tempreg)) ::(ref 1, JUMP(Reg tempreg)) ::rest) | JUMP(Reg dest) => gen1(emit(jr(dest))) | SLT_D (Reg op1, Reg op2) => gen1(emit(c_lt(D_fmt,op1,op2))) | SEQ_D (Reg op1, Reg op2) => gen1(emit(c_seq(D_fmt,op1,op2))) | BCOP1(b, ref dest) => let fun bc1f offset = cop1(8,0,offset) fun bc1t offset = cop1(8,1,offset) in if size = 1 then gen1(emit((if b then bc1t else bc1f)(dest-(pos+1)))) else gen(pos,pcptr, (ref 1, BCOP1(not b, ref(pos+size))) ::(ref (size-2), ADD(Reg pcreg, Immed(4*(dest-(get pcptr))), Reg tempreg)) ::(ref 1, JUMP(Reg tempreg)) ::rest) end | NOP => gen1(emit(add(0,0,0))) (* one of the many MIPS no-ops *) @ Our next problem is to tackle load and store. The major difficulty is if the offset is too large to fit in sixteen bits; if so, we have to create a new base register. If we have [[Immedlab]], we do it as an offset from [[pcreg]]. «functions for emitting instructions»= fun memop(rform,Reg dest, Direct (Reg base), offset) = (case size of 1 => gen1(emit(rform(dest,offset,base))) | 3 => let val (hi,lo) = split offset in gen1(emit(lui(tempreg,hi)); (* tempreg = hi @« 16 *) emit(add(tempreg,base,tempreg));(* tempreg += base *) emit(rform(dest,lo,tempreg)) (* load dest,lo(tempreg) *) ) end | _ => gen1(ErrorMsg.impossible "bad size in memop Direct in mipscoder") ) | memop(rform,Reg dest, Immed address, offset) = (case size of 1 => gen1(emit(rform(dest,offset+address,0))) | 2 => let val (hi,lo) = split (offset+address) in gen1(emit(lui(tempreg,hi)); emit(rform(dest,lo,tempreg)) ) end | _ => gen1(ErrorMsg.impossible "bad size in memop Immed in mipscoder") ) | memop(rform,Reg dest, Immedlab (ref lab), offset) = memop(rform, Reg dest, Direct (Reg pcreg), offset+4*(lab - get pcptr)) @ The actual registers don't matter for computing sizes, and in fact the value of [[pcreg]] is not visible here, so we use an arbitrary register ([[Reg 0]]) to compute the size. «functions for computing sizes»= fun adrsize(_, Reg _, Direct _, offset) = if abs(offset)<32768 then 1 else 3 | adrsize(_, Reg _, Immed address, offset) = if abs(address+offset) < 32768 then 1 else 2 | adrsize(x, Reg dest, Immedlab (ref lab), offset) = adrsize(x, Reg dest, Direct (Reg 0 (* pcreg in code *) ), offset+4*(lab-(get pcptr))) «cases for sizes to be computed»= | LOAD x => adrsize x | STORE x => adrsize x «cases of instructions to be emitted»= | LOAD (Byte,dest,address,offset) => memop(lbu,dest,address,offset) | LOAD (Word,dest,address,offset) => memop(lw,dest,address,offset) | LOAD (Floating,dest,address,offset) => memop(lwc1,dest,address,offset) | STORE (Byte,dest,address,offset) => memop(sb,dest,address,offset) | STORE (Word,dest,address,offset) => memop(sw,dest,address,offset) | STORE (Floating,dest,address,offset) => memop(swc1,dest,address,offset) @ For the shift instructions, only register and immediate operands make sense. Immediate operands make sense if and only if they are representable in five bits. If everything is right, these are single instructions. «cases for sizes to be computed»= | SLL _ => 1 | SRA _ => 1 «cases of instructions to be emitted»= | SLL (Immed shamt, Reg op1, Reg result) => gen1( if (shamt >= 0 andalso shamt < 32) then emit(sll(result,op1,shamt)) else ErrorMsg.impossible ("bad sll shamt " ^ (Integer.makestring shamt) ^ " in mipscoder")) | SLL (Direct(Reg shamt), Reg op1, Reg result) => gen1(emit(sllv(result,op1,shamt))) | SLL (Immedlab _,_,_) => ErrorMsg.impossible "sll shamt is Immedlab in mipscoder" | SRA (Immed shamt, Reg op1, Reg result) => gen1( if (shamt >= 0 andalso shamt < 32) then emit(sra(result,op1,shamt)) else ErrorMsg.impossible ("bad sra shamt " ^ (Integer.makestring shamt) ^ " in mipscoder")) | SRA (Direct(Reg shamt), Reg op1, Reg result) => gen1(emit(srav(result,op1,shamt))) | SRA (Immedlab _,_,_) => ErrorMsg.impossible "sra shamt is Immedlab in mipscoder" @ Finally, comments are ignored, and marks (backpointers) are written into the instruction stream.

Comments are used by the front end to give diagnostics. In the bad old days we would have had two different [[MIPSCODER]]s, one which generated machine code (and ignored comments), and one which wrote out assembly code (and copied comments). Today we have just one, which means the rerouting of comments takes place at a much higher level. Look in [[cps/mipsglue.nw]]. «cases for sizes to be computed»= | COMMENT _ => 0 | MARK => 1 (* backpointer takes one word *) | BREAK _ => 1 (* break instruction *) @ Just for the record, here's the description of what a mark (backpointer) is. ``Take the byte address at which the mark resides and add 4, giving the byte address of the object following the mark. (That object is the marked object.) Subtract the byte address of the initial word that marks the start of this instruction stream. Now divide by 4, giving the distance in words between the beginning of the block and the marked object. Take that quantity and shift it left by multiplying by [[power_tags]], and indicate the result is a mark by adding the tag bits [[tag_backptr]] into the low order part.'' [[pos+1]] is exactly the required distance in words. «cases of instructions to be emitted»= | COMMENT _ => gen1() | MARK => gen1( let open System.Tags in emitlong((pos+1) * power_tags + tag_backptr) end) | BREAK n => gen1( if n < 0 orelse n > 32 then ErrorMsg.impossible "bad break code" else emit(break n)) @